12. Problem Set
Problem Set 1 - Relative Probabilities
Relative Probabilities¶
# just a helper function for easier youtube call
def strip_url(url):
return url.replace('https://youtu.be/','')
from IPython.display import YouTubeVideo
url = 'https://youtu.be/kmqfftmc6hE'
YouTubeVideo(strip_url(url))
Ans¶
$$ \textstyle \begin{array}{l} \dfrac {p(Exactly\ One\ Head)}{p(First\ flip\ is\ Only\ Head)}\\ \\ \text {Total No of flips } = 4 \\ \\ \text {Total No of outcomes} = 2^4 = 16 \\ \\ \end{array} $$
It is tedious to write out all 16 outcomes or sketch a tree, so we could only focus on favourable outcomes (other outcomes only needed for total count)
Exactly One Head:
F1 | F2 | F3 | F4 |
---|---|---|---|
H | T | T | T |
T | H | T | T |
T | T | H | T |
T | T | T | H |
First Flip is Only Head:
F1 | F2 | F3 | F4 |
---|---|---|---|
H | T | T | T |
$$ \textstyle \therefore \dfrac { \dfrac {4}{16}}{ \dfrac {1}{16}} = 4 $$
Relative Probabilities 2¶
url = 'https://youtu.be/Ug-2cy_BNVo'
YouTubeVideo(strip_url(url))
First Flip is Heads:
F1 | F2 | F3 | F4 |
---|---|---|---|
H | H | T | T |
H | T | H | T |
H | T | T | H |
H | T | T | T |
H | T | H | H |
H | H | T | H |
H | H | H | T |
H | H | H | H |
8 favourable outcomes for denominator. So
$$ \dfrac { \dfrac {4}{16}}{ \dfrac {8}{16}} = \dfrac {4}{8} = 0.5 $$
Same Coin¶
This was tough.
url = 'https://youtu.be/YQhStz2eHYY'
YouTubeVideo(strip_url(url))
from graphviz import Digraph
from helpers import draw_samecoin_graph, update_samecoin_graph
g = Digraph()
g = draw_samecoin_graph(g) # hardcoded for this problem now
# Root - Choosing between Fair or Lodged Coin
g = update_samecoin_graph(g, highlight_nodes=['Root'],color='#33EA4C:#A2E9FF')
# Fair Coin - Choosing between Person 1 (HHT) or Person 2 (THH)
g = update_samecoin_graph(g, highlight_nodes=['F'],color='#33EA4C:#A2E9FF')
# Fair Coin - HHT - Person 1 - Me - M
highlight_nodes = ['H1','H3','T7']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#33EA4C')
# Fair Coin - THH - Person 2 - You - Y
highlight_nodes = ['T1','H4','H9']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#A2E9FF')
# Loaded Coin - Choosing between Person 1 (HHT) or Person 2 (THH)
g = update_samecoin_graph(g, highlight_nodes=['L'],color='#33EA4C:#A2E9FF')
# Loaded Coin - HHT - Person 1 - Me - M
highlight_nodes = ['H2','H5','T11']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#33EA4C')
# Loaded Coin - THH - Person 2 - You - Y
highlight_nodes = ['T2','H6','H13']
g = update_samecoin_graph(g, highlight_nodes=highlight_nodes,color='#A2E9FF')
g
Green flow indicates Person 1 picking up a Coin (Fair or Loaded) and flipping it thrice
Blue flow indicates Person 2 picking up a Coin (Fair or Loaded) and flipping it thrice
Person 1 always gets the order HHT so HHT characterizes Person 1 Person 2 always gets the order THH so THH characterizes Person 2
Let us denote Person 1 by M (Me), and Person 2 by Y (You)
Thus, $p(HHT)$ is same as $p(M)$ and $p(THH)$ is same as $p(Y)$. With this understanding, (closely follow above diagram for below calculations)
$$
\textstyle
\begin{array}{l}
p(\ F\ \cap\ H\ \cap\ H\ \cap\ T\ ) = p(\ F\ \cap\ HHT\ ) = p(\ F\ \cap\ M\ ) \\ \\
\text {Also} \\ \\
p(\ F\ \cap\ H\ \cap\ H\ \cap\ T\ ) = p(\ F\ )p(\ H\ )p(\ H\ )p(\ T\ ) = (0.5)^4 = 0.0625\\ \\
\therefore\ p(\ F\ \cap\ M\ ) = 0.0625 \\ \\
\text {Similarly} \\ \\
p(\ F\ \cap\ T\ \cap\ H\ \cap\ H\ ) = p(\ F\ )p(\ T\ )p(\ H\ )p(\ H\ ) = (0.5)^4 = 0.0625\\ \\
\therefore\ p(\ F\ \cap\ Y\ ) = 0.0625 \\ \\
p(\ L\ \cap\ H\ \cap\ H\ \cap\ T\ ) = p(\ L\ )p(\ H\ )p(\ H\ )p(\ T\ ) = (0.5)(0.9)^2(0.1) = 0.0405\\ \\
\therefore\ p(\ L\ \cap\ M\ ) = 0.0405 \\ \\
p(\ L\ \cap\ T\ \cap\ H\ \cap\ H\ ) = p(\ L\ )p(\ T\ )p(\ H\ )p(\ H\ ) = (0.5)(0.1)(0.9)^2 = 0.0405\\ \\
\therefore\ p(\ L\ \cap\ Y\ ) = 0.0405 \\ \\
\text {Summarizing..} \\ \\
p(\ F\ \cap\ M\ ) = 0.0625 \\
p(\ F\ \cap\ Y\ ) = 0.0625 \\
p(\ L\ \cap\ M\ ) = 0.0405 \\
p(\ L\ \cap\ Y\ ) = 0.0405 \\ \\
\end{array}
$$
Applying Bayes Theorem,
$$
\textstyle
\begin{array}{l}
p(\ F\ |\ M\ ) = \dfrac {p(\ F\ \cap\ M\ )}{\sum p(\ M\ )} = \dfrac {p(\ F\ \cap\ M\ )}{ p(\ F\ \cap\ M\ ) + p(\ L\ \cap\ M\ )} \\ \\
= \dfrac {0.0625}{0.0625 + 0.0405} = 0.6068 \\ \\
p(\ F\ |\ Y\ ) = \dfrac {p(\ F\ \cap\ Y\ )}{\sum p(\ Y\ )} = \dfrac {p(\ F\ \cap\ Y\ )}{ p(\ F\ \cap\ Y\ ) + p(\ L\ \cap\ Y\ )} \\ \\
= \dfrac {0.0625}{0.0625 + 0.0405} = 0.6068 \\ \\
p(\ L\ |\ M\ ) = \dfrac {p(\ L\ \cap\ M\ )}{\sum p(\ M\ )} = \dfrac {p(\ L\ \cap\ M\ )}{ p(\ L\ \cap\ M\ ) + p(\ L\ \cap\ M\ )} \\ \\
= \dfrac {0.0405}{0.0625 + 0.0405} = 0.3932 \\ \\
p(\ L\ |\ Y\ ) = \dfrac {p(\ L\ \cap\ Y\ )}{\sum p(\ Y\ )} = \dfrac {p(\ L\ \cap\ Y\ )}{ p(\ F\ \cap\ Y\ ) + p(\ L\ \cap\ Y\ )} \\ \\
= \dfrac {0.0405}{0.0625 + 0.0405} = 0.3932 \\ \\ \\
\text {Summarizing..} \\ \\
p(\ F\ |\ M\ ) = 0.6068 \\
p(\ F\ |\ Y\ ) = 0.6068 \\
p(\ L\ |\ M\ ) = 0.3932 \\
p(\ L\ |\ Y\ ) = 0.3932 \\
\end{array}
$$
Question: What is the probability that we flipped coins of the same type?¶
There are 2 possible cases of we both having same type of coin, and they are independent of each other.
- Both have fair coins and initiate our flip chain - case A
- Both have loaded coins and initiate our flip chain - case B
Since 2 cases, probability of either case happening is simply their sum (as they are independent as well)
$$ p(\ A\ \cup\ B\ ) = p(\ A\ ) + p(\ B\ ) $$
$p(\ A\ ) $:
Both have fair coins and initiate our flip chain. That is,
- Given its me or HHT, coin being F, that is, $p(\ F\ |\ M\ )$ - case A1
- Given its you or THH, coin being F, that is, $p(\ F\ |\ Y\ )$ - case A2
We need both to be same type, in the sense, both case A1 and A2 should happen. Thus
$$ p(\ A\ ) = p(\ A1\ \cap A2\ ) $$
A1 and A2 are also independent to each other. We do not have influence on each other. So,..
$$ p(\ A\ ) = p(\ A1\ \cap A2\ ) = p(\ A1\ )p(\ A2\ ) = p(\ F\ |\ M\ )p(\ F\ |\ Y\ ) = (0.6068)(0.6068) = 0.3682 $$
$p(\ B\ ) $:
Both have loaded coins and initiate our flip chain. That is,
- Given its me or HHT, coin being L, that is, $p(\ L\ |\ M\ )$ - case B1
- Given its you or THH, coin being L, that is, $p(\ L\ |\ Y\ )$ - case B2
We need both to be same type, in the sense, both case B1 and B2 should happen. Thus
$$ p(\ B\ ) = p(\ B1\ \cap B2\ ) $$
B1 and B2 are also independent to each other. We do not have influence on each other. So,..
$$ p(\ B\ ) = p(\ B1\ \cap B2\ ) = p(\ B1\ )p(\ B2\ ) = p(\ L\ |\ M\ )p(\ L\ |\ Y\ ) = (0.3932)(0.3932) = 0.1546 $$
Finally
$$ p(\ A\ \cup\ B\ ) = p(\ A\ ) + p(\ B\ ) = 0.3682 + 0.1546 = 0.523 $$
The next set of questions are complicated, so we will see in separate section next
Problem Set 2 - Many Flips
Many Flips¶
This was such a tricky question that it needed a deep dive in to the concept before trying to answer the questions.
Note: I have created this with emphasis on visualizing, so as to understand bigger picture better. However due to lack of time, I have not sufficiently modularized, please ignore the code, if its is daunting or unreadable, it is not essential and focus only on output as the target here is mainly to understand the concepts.
# just a helper function for easier youtube call
def strip_url(url):
return url.replace('https://youtu.be/','')
from IPython.display import YouTubeVideo
url = 'https://youtu.be/MwcRx5uXICM'
YouTubeVideo(strip_url(url))
3 Flips¶
Let us visualize and comprehend first the probability concept for 3 flips.
from graphviz import Digraph
from coinflipviz import draw_graph, get_combinations, get_combinations_consolidated, plot_combinations_consolidated
n_flips = 3
g = Digraph()
g = draw_graph(g, n_flips)
g
Each flow from Root to end, is a possibility of a sequence. For eg, on left most of tree, we have a sequence HHH, 3 heads in all 3 flips. This terminology is important. Each sequence may have any number of heads or nothing at all. Let us check, what are all the possible sequences from 3 flips and no of heads in each of those sequences.
combi_df = get_combinations(n_flips)
combi_df
As you might have guessed, the no of sequence depends on no of flips n.
$$ \text {No of sequences} = 2^n , \text {where n is no of flips} $$
Let us check, the statistics of how many heads occur in each sequence, and count them.
final_df = get_combinations_consolidated(n_flips)
final_df
So for 3 flips, we have a distribution of no of heads in sequences, and thus their associated probability as well. If we try to graph them, we get something like below.
plot_combinations_consolidated(final_df)
It is important to understand the above graph very well. For instance, on LHS, when X=1, it means, there are 3 sequences with 1 head, and we could calculate its probability as $\dfrac {3}{2^3}=0.375$ which is shown on RHS. Note the nature of the graph. Let us try again for 4 flips.
4 Flips¶
n_flips = 4
g = Digraph()
g = draw_graph(g, n_flips)
g
combi_df = get_combinations(n_flips)
combi_df
final_df = get_combinations_consolidated(n_flips)
final_df
plot_combinations_consolidated(final_df)
Now you see, where the graph goes as n increases? This is binomial distribution. Let us put aside formula and definition for a while and only focus on the nature of these graphs.
What happens when flips increase?¶
Let us find out by a small animation
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()
def animate_combinations_consolidated(df, fontsize=10):
"""
Given the dataframe with x, n(x), p(x) this provides two plots:
x vs n(x)
x vs p(x)
"""
X = df['x'].tolist()
N = df['n(x)'].tolist()
P = df['p(x)'].tolist()
rects = ax1.bar(X, N)
autolabel(ax1, rects, fontsize)
xlabel = 'No of Heads'
ylabel = 'No of Sequences\nhaving those no of Heads'
autoformat(ax1, xlabel, ylabel, fontsize)
rects = ax2.bar(X, P)
autolabel(ax2, rects, fontsize)
ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'
autoformat(ax2, xlabel, ylabel, fontsize)
def animate(i):
ax1.clear()
ax2.clear()
fig.suptitle('No of flips : {}'.format(i), fontsize=14)
temp_df = get_combinations_consolidated(i)
animate_combinations_consolidated(temp_df)
n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)
plt.tight_layout()
plt.subplots_adjust(wspace=0.5)
from IPython.display import HTML
HTML(ani.to_html5_video())
- Note that as we increase the no of flips, the graphs above approximate to a Guassian like structure.
- Maximum number of heads is always for half the number of flips. That is, if n = 10, then x = 5, that is sequence having 5 heads, has maximum probability. This is well observed, when n=even
The above graphical depictions would be very helpful to answer the quizes as we would see below.
Pascal's triangle¶
The numbers on the LHS, the n(x), interestingly are the same as in Pascal's triangle, where the row corresponds to the no of flips.
These numbers are called binomial coefficients. That is, the No of sequences we get on LHS, are binomial co efficients. Pascal's triangle is made up of binomial co oefficients.
Quiz Answers¶
1. The Probability of every Sequence becomes Small - True or False?¶
Let us take an example. Say, no of flips = 4, Then
$$ \text {Total No of sequences} = 2^4 = 16 \\ \\ \text {Probability of any single sequence} = \dfrac {\text {No of that sequence}}{\text {Total No of sequences}} = \dfrac {1}{16} = 0.0625 $$
Generalizing, for any no of flips n, $$ \text {Total No of sequences} = 2^n \\ \\ \text {Probability of any single sequence} = \dfrac {\text {No of that sequence}}{\text {Total No of sequences}} = \dfrac {1}{2^n} $$
Obviously, as n increases, $\dfrac {1}{2^n}$ decreases.
2. The probability of every number of heads becomes small - True or False?¶
Let no of flips = 4, Let us take specific number of heads, say, X = 2 (3 heads). Let us observe, what happens to X = 2, when n increases in below animation directly
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()
def animate_combinations_consolidated(df, fontsize=10):
"""
Given the dataframe with x, n(x), p(x) this provides two plots:
x vs n(x)
x vs p(x)
"""
X = df['x'].tolist()
N = df['n(x)'].tolist()
P = df['p(x)'].tolist()
rects = ax1.bar(X, N)
autolabel(ax1, rects, fontsize)
xlabel = 'No of Heads'
ylabel = 'No of Sequences\nhaving those no of Heads'
autoformat(ax1, xlabel, ylabel, fontsize)
rects = ax2.bar(X, P)
autolabel(ax2, rects, fontsize)
# highlight X = 2
if len(rects) > 2:
rects[2].set_color('r')
ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'
autoformat(ax2, xlabel, ylabel, fontsize)
def animate(i):
ax1.clear()
ax2.clear()
fig.suptitle('No of flips : {} Red : (X=2)'.format(i), fontsize=14)
temp_df = get_combinations_consolidated(i)
animate_combinations_consolidated(temp_df)
n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)
plt.tight_layout()
plt.subplots_adjust(wspace=0.5)
from IPython.display import HTML
HTML(ani.to_html5_video())
So even though, initially it increased from 0.25 (for n=2), to 0.375 (n=3), thereafter it reduces and approaches to zero as can be seen on above animation.
3. The probability of every proportion of heads becomes Small. True or False?¶
Let n = 4,
Let X = 50% of n.¶
Now this does not make sense for odd no of n, because then X would be non integer. For eg, if I have 5 flips, then X cannot be 2.5 heads! So we will check for only even no of flips.
Note:¶
Graph's y axis limits are maintained at constant, just to illustrate the reducing nature of probabilities. This had been the case all along, only that earlier, the y axis was auto scaling so reducing nature was only obvious from carefully observing numbers.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()
def animate_combinations_consolidated(i, df, fontsize=10):
"""
Given the dataframe with x, n(x), p(x) this provides two plots:
x vs n(x)
x vs p(x)
"""
X = df['x'].tolist()
N = df['n(x)'].tolist()
P = df['p(x)'].tolist()
rects = ax1.bar(X, N)
autolabel(ax1, rects, fontsize)
xlabel = 'No of Heads'
ylabel = 'No of Sequences\nhaving those no of Heads'
autoformat(ax1, xlabel, ylabel, fontsize)
rects = ax2.bar(X, P)
autolabel(ax2, rects, fontsize)
ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'
autoformat(ax2, xlabel, ylabel, fontsize)
# highlight X corresponding to 50% of no of flips
if ((i) % 2) == 0: # if only even,
y = int(i/2)
if len(rects) > 2:
#print(y)
rects[y].set_color('r')
# also set y limit to keep the scale constant to view the effect as n increases
ymin = ax2.get_ylim()[0]
ymax = 0.5*1.1 # increase space to insert bar value and assuming we will not cross 0.5 prob ever
ax2.set_ylim([ymin,ymax])
ymin = ax1.get_ylim()[0]
ymax = 260*1.1 # just hardcoding for this example. MODIFY THIS IF YOU CHANGE N_FLIPS
ax1.set_ylim([ymin,ymax])
def animate(i):
ax1.clear()
ax2.clear()
fig.suptitle('No of flips : {} Red X = 50% of No of flips'.format(i), fontsize=14)
temp_df = get_combinations_consolidated(i)
animate_combinations_consolidated(i, temp_df)
n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)
plt.tight_layout()
plt.subplots_adjust(wspace=0.5)
from IPython.display import HTML
HTML(ani.to_html5_video())
From RHS graph, one could easily observe that the maximum probability that could occur (at $X = \dfrac {n}{2}$) for any n decreases, as n increases.
Since the max probability is reducing, it is also obvious, for any other X it would still reduce, which can also be observed from other adjacent blue bars from above RHS graph.
4. The probability of every range of proportions becomes small - True or False?¶
We will prove this by taking the exceptional case, where it is False. This is difficult to observe at first, as there are wide range of proportions possible, but once we understand the nature of the "Gaussian" like curve of probabilities, it would be easier to comprehend.
Note the phrase range of proportions. This means, from any % of n to any other % of n. The exceptional case we take up to argue its falseness is,
$X \leq 50\%$ of n¶
Suppose, n = 5, then $X \leq \dfrac {5}{2} = X \leq 2.5$, thus X = 0, 1, 2
We should then find
$$ p(X \leq 2.5) = p(X = 0) + p(X = 1) + p(X = 2) $$
Note also that, for n = even, we cannot accurately divide 50%, as a fair share of probability goes to middle X=50% case (just like we saw earlier), so for this case, we will take only X = odd to prove our point.
This partiality does not matter, as n tends to go infinity, we get a smooth guassian like density function, and we just take area for range of X, instead of individual bars as of now.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()
def animate_combinations_consolidated(i, df, fontsize=10):
"""
Given the dataframe with x, n(x), p(x) this provides two plots:
x vs n(x)
x vs p(x)
"""
X = df['x'].tolist()
N = df['n(x)'].tolist()
P = df['p(x)'].tolist()
rects = ax1.bar(X, N)
autolabel(ax1, rects, fontsize)
xlabel = 'No of Heads'
ylabel = 'No of Sequences\nhaving those no of Heads'
autoformat(ax1, xlabel, ylabel, fontsize)
rects = ax2.bar(X, P)
autolabel(ax2, rects, fontsize)
ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'
autoformat(ax2, xlabel, ylabel, fontsize)
# highlight X corresponding to 50% of no of flips
if (i % 2) == 0:
pass
else: # if only odd
from math import floor
i_50_range = int(i/2)
for each_bar in range(0, i_50_range+1):
rects[each_bar].set_color('r')
ax2.text(-1,0.52,' Red Area: 0.5', fontsize=12)
# also set y limit to keep the scale constant to view the effect as n increases
ymin = ax2.get_ylim()[0]
ymax = 0.5*1.1 # increase space to insert bar value and assuming we will not cross 0.5 prob ever
ax2.set_ylim([ymin,ymax])
ymin = ax1.get_ylim()[0]
ymax = 260*1.1 # just hardcoding for this example. MODIFY THIS IF YOU CHANGE N_FLIPS
ax1.set_ylim([ymin,ymax])
def animate(i):
ax1.clear()
ax2.clear()
fig.suptitle('No of flips : {} Red X <= 50% of No of flips'.format(i), fontsize=14)
temp_df = get_combinations_consolidated(i)
animate_combinations_consolidated(i, temp_df)
n_flips = 10
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)
plt.tight_layout()
plt.subplots_adjust(wspace=0.5)
from IPython.display import HTML
HTML(ani.to_html5_video())
From RHS graph, pause for every odd no of flips, and try calculating sum of red bar values. It would always be 0.5.
$ n = 1,\ X \leq (1)(0.5),\ p(X \leq 0)\ = \ p(X = 0) \ = \ 0.5 \\ \\ n = 3,\ X \leq (3)(0.5), \ p(X \leq 1)\ = \ p(X = 0) \ + p(X = 1)\ = \ 0.125 + 0.375 = 0.5 \\ \\ n = 5,\ X \leq (5)(0.5), \ p(X \leq 2)\ = \ p(X = 0) \ + p(X = 1)\ + p(X=2)\ = \ 0.03125+0.15625+0.31250 = 0.5 \\ \\ n = 7,\ X \leq (7)(0.5), \ p(X \leq 3)\ = \ p(X = 0) \ + p(X = 1)\ + p(X=2)\ + p(X=3)\ = \ 0.00781+0.05468+0.16406+0.2734375 = 0.5 \\ \\ n = 9,\ X \leq (9)(0.5), \ p(X \leq 4)\ = \ p(X = 0) \ + p(X = 1)\ + p(X=2)\ + p(X=3)\ + p(X=4)\ = \ 0.001953+0.017578125+0.0703125+0.1640625+0.24609375= 0.5 \\ \\ $
This is a typical charactersitic of such a distribution. This could be extended that, as n approaches $\infty$, the smooth probability curve we get, always have half of its area to be of 0.5 probability. This makes sense, after all, total probability (total area of the curve) should be 1, and it is symmetrical (as you can see how it is growing), so one could easily infer, half of it would be having 50% probability.
5. The probability of some ranges of proportions becomes small - True or False?¶
What if our range of proportions is itself, less than or more than 50%? Let us find out for 33% (to be inline with goldsong's explanation.
$X \leq 33\%$ of n¶
Earlier we chose odd X, because, we were able to find X the sum of which wholly satisfied our < 50% criteria. In this case, it is tricky. It would be evident, when we plot 33% of n values in a table.
import pandas as pd
cols = ['n_flips', 'n_33_range', 'floored_x', 'ceiled_x']
df = pd.DataFrame(columns=cols)
def yellow(val):
color = 'yellow'
return 'background-color: %s' % color
for i in range(1, 11): # 10 flips
n_flips = i
n_33_range = i*0.33
floored_x = int(i*0.33)
from math import ceil
ceiled_x = int(ceil(i*0.33))
df = df.append({'n_flips' : n_flips, 'n_33_range' : n_33_range, 'floored_x' : floored_x, 'ceiled_x': ceiled_x}, ignore_index=True)
df[['n_flips','floored_x','ceiled_x']] = df[['n_flips','floored_x','ceiled_x']].astype(int)
df = df.astype(object).T # ref: https://stackoverflow.com/questions/47907195/pandas-transpose-resets-decimal-rounding
#list(df.columns.values)
df.style.applymap(yellow, subset=pd.IndexSlice[:,[2,5,8]])
Observe that, when no of flips = 3, 6, 9, etc, as highlighted, we have 33% of n as a whole number (to observe equivalent no of heads).
For example,
When n_flips = 3, n_33_range = 0.99 $\cong$ 1, so X = 1 could almost wholly represent 33% of n_flips
When n_flips = 6, n_33_range = 1.98 $\cong$ 2, so X = 2 could almost wholly represent 33% of n_flips
When n_flips = 9, n_33_range = 2.97 $\cong$ 3, so X = 3 could almost wholly represent 33% of n_flips
However, for other values of n_flips, picking up floor or ceiled value would leave gaps introducing undesired errors that could screw up our observation.
For example,
When n_flips = 2, n_33_range = 0.66, so neither X = 0 or X = 1 is good approximation to be taken for observation.
Also note, ceiled value of n_33_range, serves better than floored value unlike earlier cases.
Visualizing for 33%.. (n_flips increased to 12, to have one more multiple of 3 datapoint)
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from coinflipviz import autolabel, autoformat
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
plt.close()
def animate_combinations_consolidated(i, df, fontsize=10):
"""
Given the dataframe with x, n(x), p(x) this provides two plots:
x vs n(x)
x vs p(x)
"""
X = df['x'].tolist()
N = df['n(x)'].tolist()
P = df['p(x)'].tolist()
rects = ax1.bar(X, N)
autolabel(ax1, rects, fontsize)
xlabel = 'No of Heads'
ylabel = 'No of Sequences\nhaving those no of Heads'
autoformat(ax1, xlabel, ylabel, fontsize)
rects = ax2.bar(X, P)
autolabel(ax2, rects, fontsize)
ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'
autoformat(ax2, xlabel, ylabel, fontsize)
# highlight X corresponding to 50% of no of flips
if (i % 3) == 0: # only if multiple of 3 as reasoned earlier
from math import ceil
i_33_range = int(ceil(i*0.33)) # ceiled value serves better in our case
red_area = 0
for each_bar in range(0, i_33_range+1):
rect = rects[each_bar]
rect.set_color('r')
red_area += rect.get_height()
ax2.text(-1,0.52,' Red Area : {}'.format(red_area), fontsize=12)
# also set y limit to keep the scale constant to view the effect as n increases
ymin = ax2.get_ylim()[0]
ymax = 0.5*1.1 # increase space to insert bar value and assuming we will not cross 0.5 prob ever
ax2.set_ylim([ymin,ymax])
ymin = ax1.get_ylim()[0]
ymax = 950*1.1 # just hardcoding for this example. MODIFY THIS IF YOU CHANGE N_FLIPS
ax1.set_ylim([ymin,ymax])
def animate(i):
ax1.clear()
ax2.clear()
fig.suptitle('No of flips : {} Red X <= 33% of No of flips'.format(i), fontsize=14)
temp_df = get_combinations_consolidated(i)
animate_combinations_consolidated(i, temp_df)
n_flips = 12
ani = animation.FuncAnimation(fig, animate, np.arange(1, n_flips+1), interval=1000)
plt.tight_layout()
plt.subplots_adjust(wspace=0.5)
from IPython.display import HTML
HTML(ani.to_html5_video())
Thus, you could observe from graph that,
$ \small { n = 3,\ X \leq (3)(0.33),\ p(X \leq 0.99)\ \cong p(X \leq 1)\ = p(X=0) + p(X=1) = 0.125+0.375 = 0.5 \\ \\ n = 6,\ X \leq (6)(0.33), \ p(X \leq 1.98)\ \cong p(X \leq 2)\ = p(X=0) + p(X=1) + p(X=2) = 0.01562+0.09375+0.234375 = 0.344 \\ \\ n = 9,\ X \leq (9)(0.33), \ p(X \leq 2.97)\ \cong p(X \leq 3)\ = p(X=0) + p(X=1) + p(X=2) +p(X=3) = \\ \\ 0.001953125 + 0.017578125 + 0.0703125 + 0.1640625 = 0.2539 \\ \\ n = 12,\ X \leq (12)(0.33), \ p(X \leq 3.96)\ \cong p(X \leq 4)\ = p(X=0) + p(X=1) + p(X=2) +p(X=3) + p(X=4) = \\ \\ \dfrac {1}{2^{12}} + \dfrac {12}{2^{12}} + \dfrac {66}{2^{12}} + \dfrac {220}{2^{12}} + \dfrac {495}{2^{12}} = 0.00024414062 + 0.0029296875 + 0.01611328125 + 0.053710937 + 0.12084960937 = 0.19384765574 } $
Thus, as you could see, for X $\leq$ 33% of n, the probability decreases as n increases.
Note:¶
Take care trying more number of flips with above snippets. They are not efficient, and thus time sensitive. The nature of our problem is, the numbers grow exponentially, so the PC might have for long time, even for n greater than 15.
Problem Set 3 - Many Flips Ctd
# just a helper function for easier youtube call
def strip_url(url):
return url.replace('https://youtu.be/','')
from IPython.display import YouTubeVideo
url = 'https://youtu.be/Nq1p7MdxlBU'
YouTubeVideo(strip_url(url))
These are thankfully pretty straight forward, so we could directly calculate using Bayes' theorem.
4 Heads and 0 Tails¶
This means, 4 flips, and sequence HHHH
So question is what is the probability of fair coin, given HHHH?
$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ HHHH) = \dfrac {p(Fair\ \cap\ HHHH)}{\sum p(HHHH\ )} = \dfrac {p(Fair\ \cap\ HHHH)}{p(Fair\ \cap\ HHHH) + p(Loaded\ \cap\ HHHH)} = \dfrac {(0.9)(0.5)^4}{(0.9)(0.5)^4 + (0.1)(0.9)^4} = 0.4615 $
p_F = 0.9
p_HF = 0.5
p_L = 0.1
p_HL = 0.9
# 4 flips, 4 Heads and 0 Tails
p_F_given_H_4 = p_F*(p_HF)**4/(p_F*(p_HF)**4 + p_L*(p_HL**4))
p_F_given_H_4
10 Heads and 0 Tails¶
Similar like above, this time to the power of 10
$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 10H) = \dfrac {p(Fair\ \cap\ 10H)}{\sum p(10H\ )} = \dfrac {p(Fair\ \cap\ 10H)}{p(Fair\ \cap\ 10H) + p(Loaded\ \cap\ 10H)} = \dfrac {(0.9)(0.5)^{10}}{(0.9)(0.5)^{10} + (0.1)(0.9)^{10}} = 0.02458 $
# 10 flips, 10 Heads and 0 Tails
p_F_given_H_10 = p_F*(p_HF)**10/(p_F*(p_HF)**10 + p_L*(p_HL**10))
p_F_given_H_10
20 Heads and 0 Tails¶
Similar like above, this time to the power of 20
$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 20H) = \dfrac {p(Fair\ \cap\ 20H)}{\sum p(20H\ )} = \dfrac {p(Fair\ \cap\ 20H)}{p(Fair\ \cap\ 20H) + p(Loaded\ \cap\ 20H)} = \dfrac {(0.9)(0.5)^{20}}{(0.9)(0.5)^{20} + (0.1)(0.9)^{20}} = 0.0007059 $
# 20 flips, 20 Heads and 0 Tails
p_F_given_H_20 = p_F*(p_HF)**20/(p_F*(p_HF)**20 + p_L*(p_HL**20))
p_F_given_H_20
0 Heads and 5 Tails¶
Similar like above, except that its tails now.
Implied No of flips = 5
$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 5T) = \dfrac {p(Fair\ \cap\ 5T)}{\sum p(5T\ )} = \dfrac {p(Fair\ \cap\ 5T)}{p(Fair\ \cap\ 5T) + p(Loaded\ \cap\ 5T)} = \dfrac {(0.9)(0.5)^{5}}{(0.9)(0.5)^{5} + (0.1)(0.1)^{5}} = 0.9999 \cong 1 $
Wow that is almost always probable. If you have 5 flips, fair coin having 5 T is almost certain?? Some one please double check this.
# 5 flips, 0 Heads and 5 Tails
p_TF = 0.5
p_TL = 0.1
p_F_given_T_5 = p_F*(p_TF**5)/( p_F*(p_TF**5) + p_L*(p_TL**5))
p_F_given_T_5
5 Flips, 2 Heads, 3 Tails¶
n = 5, X = 2 (heads), We could use Pascal's triangle, to find the no of sequences having 2 heads for 5 flip problem.
From Pascal's triangle, there are 10 sequences possible where, there are 2 Heads and 3 Tails.
For fair coin, each of those 10 sequences will have joint probability, $p(2H3T) = (0.5)^2(0.5)^3 = 0.5^5 = 0.03125 $
For loaded coin, each of those 10 sequences will have joint probability $p(2H3T) = (0.9)^2(0.1)^3 = 0.00081 $
Joint probability of coin being fair and having 2H3T for one sequence $ (0.9)(0.03125) = 0.028125 $
Joint probability of coin being loaded and having 2H3T for one sequence $ (0.1)(0.00081) = 0.000081 $
$ p(Fair\ |\ Flips) = \dfrac {p(Fair\ \cap\ Flips)}{\sum p(Flips\ )} \\ \\ p(Fair\ |\ 2H3T) = \dfrac {p(Fair\ \cap\ 2H3T)}{\sum p(2H3T\ )} = \dfrac {p(Fair\ \cap\ 2H3T)}{p(Fair\ \cap\ 2H3T) + p(Loaded\ \cap\ 2H3T)} = \dfrac { 10(0.028125) }{ 10(0.028125) + 10(0.000081)} = 0.9971 \cong 1 $
0.028125/(0.028125 + 0.000081)
Thus, answer is (probabilities which are < 0.5)
10 H 0 T
20 H 0 T
Problem Set 4 - Visualizing Many Flips
Visualizing Many Flips¶
Comprehending Many Flips Problem¶
Problem Set 2 Probability
section is a huge speed bump in Intro to Statistics - Udacity course, providing quizzes of high levels, which could be answered only with deeper knowledge of maths and also probability theory which is in strong contrast with flow of the course (which is irony because typical reader is new to it, that is why he is into the course in first place).
I have created below snippets to ease the pain. It provides functions to visualize and comprehend probability as no of flips increases. Hopefully it is useful to you.
Basic python is a pre requisite to understand the functions, but usage is simple and straight forward.
Visualize Probability Tree¶
from graphviz import Digraph
g = Digraph()
def draw_graph(g, n_flips=2):
"""
Given no of flips, this function creates a corresponding probability tree
"""
#g.attr(rankdir='LR', ranksep='0.5')
g.attr('node', shape='circle', fontsize='10')
g.attr('edge', fontsize='10')
g.node('Root','R',style='filled', fillcolor='#DCDCDC') # first node
i_outcome = 1
parent_list = []
for each_flip in range(1, n_flips+1):
n_outcomes = 2**each_flip
temp_list = []
p_index = 0 # parent index for each node
for each_outcome in range(0, int(n_outcomes/2)):
# draw nodes, record parents
new_H = 'H{}'.format(i_outcome)
new_T = 'T{}'.format(i_outcome)
g.node(new_H, 'H', style='filled', fillcolor='#FFFAAE')
g.node(new_T, 'T', style='filled', fillcolor='#D2FFFF')
i_outcome += 1
parents = parent_list[-1] if len(parent_list) > 0 else []
parent = parents[p_index] if len(parents) > 0 else None
# debug
#print('Flip:{} New H:{} New T:{} Parents:{} Parent Index:{}'.format(each_flip, new_H, new_T,list(parents), p_index))
#print('Flip:{} New H:{} New T:{} Parent:{}'.format(each_flip, new_H, new_T,parent))
# draw edges
if parent is not None:
g.edge(parent, new_H)
g.edge(parent, new_T)
else:
g.edge('Root', new_H)
g.edge('Root', new_T)
# for next set of H and T
p_index += 1
temp_list.append(new_H)
temp_list.append(new_T)
parent_list.append(temp_list)
print()
return g
draw_graph(g, n_flips=3)
Final Outcomes for any number of flips¶
import pandas as pd
def get_combinations(n_flips=2):
"""
Given the no of flips, this function will provide the final sequence combinations as a panda dataframe
"""
# setup data frame with necessary cols
columns = ['sequence', 'x']
df = pd.DataFrame(columns=columns)
# get the combinations
from itertools import product
for i in product(['H','T'], repeat=n_flips):
combi = ''.join(i)
n_H = combi.count('H') # no of heads
df = df.append({'sequence': combi, 'x': n_H }, ignore_index=True)
# get no of heads in the combinations
print('Given no of flips:', n_flips)
print('\nx = no of heads in respective sequence')
return df
combi_df = get_combinations(n_flips=3)
combi_df
Characteristics of Probability of Sequences¶
def get_combinations_consolidated(n_flips=2):
"""
Given the raw dataframe of combinations, this will provide n(x) and p(x)
"""
# setup data frame with necessary cols
columns = ['x', 'n(x)', 'p(x)']
df = pd.DataFrame(columns=columns)
# get raw data
combi_df = get_combinations(n_flips=n_flips)
x_list = combi_df['x'].tolist()
# extract frequency
#ref: https://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list/2162045
x_list.sort()
from itertools import groupby
freq_tuple = [ (key, len(list(group))) for key, group in groupby(x_list)]
#print(freq_tuple)
for each_freq_tuple in freq_tuple:
x = each_freq_tuple[0]
n_x = each_freq_tuple[1]
p_x = n_x/(2**n_flips) # its a conditional probability, thats y divided by total outcomes
df = df.append({'x': x, 'n(x)': n_x, 'p(x)': p_x }, ignore_index=True)
# convert cols to integer (except p(x))
df[['x','n(x)']] = df[['x','n(x)']].astype(int) #ref: https://stackoverflow.com/questions/21291259/convert-floats-to-ints-in-pandas/21291622
print('n(x) = total no of possible x type sequences')
print('for eg, if x = 2, n(x) = 3, then there are 3 possible sequence types, in each of which, no of heads is 2')
print('\np(x) = conditional probability that n(x) could occur out of all outcomes')
return df
final_df = get_combinations_consolidated(3)
final_df
import matplotlib.pyplot as plt
def autoformat(ax, xlabel, ylabel, fontsize):
"""
Few tweaks for better graph
"""
ax.set_xlabel(xlabel, fontsize=fontsize)
ax.set_ylabel(ylabel, fontsize=fontsize)
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
label.set_fontsize(fontsize) # Size here overrides font_prop
ymin = ax.get_ylim()[0]
ymax = ax.get_ylim()[1]*1.1 # increase space to insert bar value
ax.set_ylim([ymin,ymax])
# x values should be integers as its no of heads
xmin = -1
xmax = ax.get_xlim()[1]
from math import ceil
xmaxint = ceil(xmax)+1
xint = range(xmin, xmaxint)
ax.set_xticks(xint)
def autolabel(ax, rects, fontsize):
"""
Attach a text label above each bar displaying its height
ref: https://matplotlib.org/2.0.2/examples/api/barchart_demo.html
"""
for rect in rects:
height = rect.get_height()
#text = '%.4f' % height
text = '{0: <{width}}'.format(height, width=1)
ax.text(rect.get_x() + rect.get_width()/2., 1.005*height,text, ha='center', va='bottom', fontsize=fontsize+3, color='red')
def plot_combinations_consolidated(df, fontsize=10):
"""
Given the dataframe with x, n(x), p(x) this provides two plots:
x vs n(x)
x vs p(x)
"""
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
X = df['x'].tolist()
N = df['n(x)'].tolist()
P = df['p(x)'].tolist()
rects = ax1.bar(X, N)
autolabel(ax1, rects, fontsize)
xlabel = 'No of Heads'
ylabel = 'No of Sequences\nhaving those no of Heads'
autoformat(ax1, xlabel, ylabel, fontsize)
rects = ax2.bar(X, P)
autolabel(ax2, rects, fontsize)
ylabel = 'Probability of ANY of Sequences\nhaving those no of Heads'
autoformat(ax2, xlabel, ylabel, fontsize)
plt.tight_layout()
plt.subplots_adjust(wspace=0.45)
plt.show()
plot_combinations_consolidated(final_df)
Let us try again all for no of flips as 4¶
This time from library where we have moved these functions..
from coinflipviz import draw_graph, get_combinations, get_combinations_consolidated, plot_combinations_consolidated
n_flips = 4
g = Digraph()
g = draw_graph(g, n_flips)
g
combi_df = get_combinations(n_flips)
combi_df
final_df = get_combinations_consolidated(n_flips)
final_df
plot_combinations_consolidated(final_df)
Problem Set 5 - Visualizing Many Flips - Ctd
Visualizing Many Flips - Ctd¶
Objective¶
We have already seen coin flips, and how they relate to Pascal's triangle, how they evolve in to binomial distribution earlier in Problem Set 2 - Probability
section.
In 19A. Central Limit Theorem
section, Sebastian ups the game with tossing a die with 3 outcomes (1,2,3). While we could follow pattern to calculate results, visualizing how that outcomes would be could be helpful to strengthen our insights. So I have created another set of functions that could help us do that.
Note: This section focuses only on programming part that helps us visualize the outputs when number of outcomes vary. I will have another section dedicated to 19A. Central Limit Theorem
where I pull up these functions and illustrate the examples provided by Sebastian.
# just a helper function for easier youtube call
def strip_url(url):
return url.replace('https://youtu.be/','')
from IPython.display import YouTubeVideo
url = 'https://youtu.be/jajxhNBnbmI'
YouTubeVideo(strip_url(url))
Toss Definition¶
Suppose we have a dice, that produces 3 outcomes 0, 1, 2 when tossed. Suppose we toss it 2 times. The questions at hand are,
- What are the possible outcomes/sequences
- What are the number of occurences of sum of their individual outcomes?
- What are their probabilities (assuming equal probability for n outcomes as $\dfrac {1}{n}$)
What are the possible outcomes/sequences?¶
from graphviz import Digraph
import colorsys
import random
def HSVToHex(h, s, v):
(r, g, b) = colorsys.hsv_to_rgb(h, s, v)
hexy = "".join("%02X" % round(r*255)) + "".join("%02X" % round(g*255)) + "".join("%02X" % round(b*255))
return hexy
def getDistinctColors(n):
huePartition = 1.0 / (n + 1)
saturation = 0.2
lightness = 0.9
colors_list = [HSVToHex(huePartition * value, saturation, lightness) for value in range(0, n)]
return colors_list#, colors_comp_list
def draw_graph(g, n_outcomes = 2, n_flips=2):
"""
Given no of flips, this function creates a corresponding probability tree
For now, assuming outcomes are numbered
"""
#g.attr(rankdir='LR', ranksep='0.5')
g.attr('node', shape='circle', fontsize='10')
g.attr('edge', fontsize='10')
g.node('Root','R',style='filled', fillcolor='#DCDCDC') # first node
i_kids = 1
parent_list = []
# colors for each outcome
colors_list = getDistinctColors(n_outcomes)
#print(colors_list)
# for each flip
for each_flip in range(1, n_flips+1):
per_flip_outcomes = n_outcomes**(each_flip)
# for each node outcome of flip
temp_list = []
p_index = 0 # parent index for each node
for each_outcome in range(0, int(per_flip_outcomes/n_outcomes)):
# draw nodes, record parents
for each_kid in range(0, n_outcomes):
new_kid_ID = '{}'.format(i_kids) # Id the kid
new_kid_label = str(each_kid)
g.node(new_kid_ID, new_kid_label, style='filled', fillcolor='#{}'.format(colors_list[each_kid]), fontcolor='black')
parents = parent_list[-1] if len(parent_list) > 0 else []
parent = parents[p_index] if len(parents) > 0 else None
#debug
#print('Flip:{} Outcome:{} Kid\'s Label:{} Possible ID:{} Parent:{}'.format(each_flip, each_outcome, each_kid, i_kids, parent))
i_kids += 1
# draw edges
if parent is not None:
g.edge(parent, new_kid_ID)
else:
g.edge('Root', new_kid_ID)
# for next set of kids
temp_list.append(new_kid_ID)
p_index += 1
parent_list.append(temp_list)
print()
return g
g = Digraph(strict=True)
g = draw_graph(g, n_outcomes = 3, n_flips=1)
g
g = Digraph(strict=True)
g = draw_graph(g, n_outcomes = 3, n_flips=2)
g
Tabular View¶
Let X denote the total sum of any sequence after each toss/flip.
import pandas as pd
def sum_digits(n):
n = int(n)
s = 0
while n:
s += n % 10
n //= 10
return s
def get_combinations(n_outcomes = 2, n_flips=2):
"""
Given the no of flips, this function will provide the final sequence combinations as a panda dataframe
"""
# setup data frame with necessary cols
columns = ['sequence', 'x']
df = pd.DataFrame(columns=columns)
# generate the individual outcomes
outcomes = list(range(0,n_outcomes)) # so if 3, its 0,1,2
outcomes = [str(i) for i in outcomes] # convert all to string
# get the combinations
from itertools import product
for i in product(outcomes, repeat=n_flips):
combi = ''.join(i)
summy = sum([int(x) for x in i])
#print(i,combi, summy)
df = df.append({'sequence': combi, 'x': summy }, ignore_index=True)
# get no of heads in the combinations
#print('Given no of flips:', n_flips)
#print('\nx = no of heads in respective sequence')
return df
get_combinations(n_outcomes = 3, n_flips=2)
What are the number of occurences of sum of their individual outcomes? And their probabilities?¶
Now let us take the number of occurances for different values of X. For eg, n(X=2) implies, the number of sequences having sum of individual outcomes being 2
def get_combinations_consolidated(n_outcomes = 2, n_flips=2):
"""
Given the raw dataframe of combinations, this will provide n(x) and p(x)
"""
# setup data frame with necessary cols
columns = ['x', 'n(x)', 'p(x)']
df = pd.DataFrame(columns=columns)
# get raw data
combi_df = get_combinations(n_outcomes=n_outcomes, n_flips=n_flips)
x_list = combi_df['x'].tolist()
# extract frequency
#ref: https://stackoverflow.com/questions/2161752/how-to-count-the-frequency-of-the-elements-in-a-list/2162045
x_list.sort()
from itertools import groupby
freq_tuple = [ (key, len(list(group))) for key, group in groupby(x_list)]
#print(freq_tuple)
for each_freq_tuple in freq_tuple:
x = each_freq_tuple[0]
n_x = each_freq_tuple[1]
p_x = n_x/(n_outcomes**n_flips) # its a conditional probability, thats y divided by total outcomes
df = df.append({'x': x, 'n(x)': n_x, 'p(x)': p_x }, ignore_index=True)
# convert cols to integer (except p(x))
df[['x','n(x)']] = df[['x','n(x)']].astype(int) #ref: https://stackoverflow.com/questions/21291259/convert-floats-to-ints-in-pandas/21291622
#print('n(x) = total no of possible x type sequences')
#print('for eg, if x = 2, n(x) = 3, then there are 3 possible sequence types, in each of which, no of heads is 2')
#print('\np(x) = conditional probability that n(x) could occur out of all outcomes')
return df
final_df = get_combinations_consolidated(n_outcomes = 3, n_flips=2)
final_df
import matplotlib.pyplot as plt
def autoformat(ax, xlabel, ylabel, fontsize):
"""
Few tweaks for better graph
"""
ax.set_xlabel(xlabel, fontsize=fontsize)
ax.set_ylabel(ylabel, fontsize=fontsize)
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
label.set_fontsize(fontsize) # Size here overrides font_prop
ymin = ax.get_ylim()[0]
ymax = ax.get_ylim()[1]*1.1 # increase space to insert bar value
ax.set_ylim([ymin,ymax])
# x values should be integers as its no of heads
xmin = -1
xmax = ax.get_xlim()[1]
from math import ceil
xmaxint = ceil(xmax)+1
xint = range(xmin, xmaxint)
ax.set_xticks(xint)
def autolabel(ax, rects, fontsize):
"""
Attach a text label above each bar displaying its height
ref: https://matplotlib.org/2.0.2/examples/api/barchart_demo.html
"""
for rect in rects:
height = rect.get_height()
#text = '%.4f' % height
text = '{0: <{width}}'.format(height, width=1)
ax.text(rect.get_x() + rect.get_width()/2., 1.005*height,text, ha='center', va='bottom', fontsize=fontsize+3, color='red')
def plot_combinations_consolidated(df, fontsize=10):
"""
Given the dataframe with x, n(x), p(x) this provides two plots:
x vs n(x)
x vs p(x)
"""
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,5))
X = df['x'].tolist()
N = df['n(x)'].tolist()
P = df['p(x)'].tolist()
rects = ax1.bar(X, N)
autolabel(ax1, rects, fontsize)
xlabel = 'Sum of each outcome in a Sequence'
ylabel = 'No of Sequences\nhaving that Sum'
autoformat(ax1, xlabel, ylabel, fontsize)
rects = ax2.bar(X, P)
autolabel(ax2, rects, fontsize)
ylabel = 'Probability of ANY of Sequences\nhaving that Sum'
autoformat(ax2, xlabel, ylabel, fontsize)
plt.tight_layout()
plt.subplots_adjust(wspace=0.45)
plt.show()
plot_combinations_consolidated(final_df)
Tests..¶
g = Digraph(strict=True)
n_outcomes = 3
n_flips = 2
g = draw_graph(g, n_outcomes = n_outcomes, n_flips=n_flips)
g
final_df = get_combinations_consolidated(n_outcomes = n_outcomes, n_flips=n_flips)
plot_combinations_consolidated(final_df)